Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Abstract

Attention is a commonly used mechanism in sequence processing, but it is of O(n^2) complexity, which prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computation-efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while being efficient in the number of parameters. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
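The O(n log n) claim can be illustrated with the classical shuffle-exchange wiring that gives the network its name. The sketch below is a minimal illustration, not the paper's model: the function names and the toy `merge` switch are hypothetical, and the learned residual switch unit (GELU, Layer Normalization) is replaced by a plain Python callable. It only shows that log2(n) rounds of pairwise exchange followed by a perfect shuffle let every output depend on every input while using (n/2) * log2(n) switch evaluations.

```python
def perfect_shuffle(seq):
    """Interleave the first and second halves of seq (riffle shuffle).

    For a power-of-two length this moves the element at index i to the
    index obtained by cyclically rotating the bits of i left by one.
    """
    half = len(seq) // 2
    out = []
    for a, b in zip(seq[:half], seq[half:]):
        out.extend((a, b))
    return out


def exchange(seq, switch):
    """Apply a two-input, two-output switch to each adjacent pair."""
    out = []
    for i in range(0, len(seq), 2):
        out.extend(switch(seq[i], seq[i + 1]))
    return out


def shuffle_exchange_pass(seq, switch):
    """Run log2(n) rounds of exchange followed by perfect shuffle.

    Returns the resulting sequence and the number of switch calls made.
    """
    n = len(seq)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    rounds = n.bit_length() - 1  # exact log2 for powers of two
    calls = 0
    for _ in range(rounds):
        seq = exchange(seq, switch)
        calls += n // 2
        seq = perfect_shuffle(seq)
    return seq, calls


# Toy switch (an assumption for illustration): each output records which
# inputs have influenced it so far.
def merge(a, b):
    return a | b, a | b


inputs = [frozenset([i]) for i in range(8)]
outputs, calls = shuffle_exchange_pass(inputs, merge)
print(all(len(o) == 8 for o in outputs), calls)
```

With 8 inputs, 3 rounds of 4 switches each are enough for every position to have seen every other position, which is the property that lets such a network model long-range dependencies without the quadratic cost of attention.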

Similar resources

Strict nonblockingness of reduced shuffle-exchange networks

The shuffle-exchange network is one of the most well-studied multistage interconnection networks. Whether a (2n-1)-stage shuffle-exchange network is rearrangeable has been a challenging conjecture for some 30 years, and only recently a proof was claimed. In this article, we use the analysis method developed for EGSN networks to show that the shuffle-exchange network can be strictly nonblocking b...

On the Rearrangeability of Shuffle-Exchange Networks

Let be the minimum positive integer so that the Shuffle-Exchange network with stages, inputs and outputs is rearrangeable. Beneš conjectured that . The best bounds known so far are . In this paper, we verify Beneš conjecture for , and use this result to show that . The case is considerably more complex than the case, which has been done in the literature. We believe that hidden in our proof th...

Simulating Shuffle–Exchange Networks with P Systems

We present in this paper a simulation with P systems of the parallel architecture known as shuffle–exchange network. This will lead us to consider a new version of P systems with communication, for which the communication graphs are not fixed, but have a dynamic evolution, and for which different communication rules can be associated to different communication graphs. The simulation of the shuf...

Fault-Tolerance Properties of deBruijn and Shuffle-Exchange Networks

We study node fault-tolerance properties of the d-ary deBruijn and shuffle-exchange networks by appealing to the algebraic structure of their underlying digraphs. In particular, we prove that both of these families of digraphs have connectivity equal to their minimum degree. This result is new in the case of the shuffle-exchange digraphs and can be extended for both families to charac...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i8.16890